Maximum rank correlation training for statistical machine translation

Authors

  • Daqi Zheng
  • Yifan He
  • Yang Liu
  • Qun Liu
Abstract

We propose Maximum Rank Correlation (MRC) as an objective function for the discriminative tuning of parameters in a linear model for Statistical Machine Translation (SMT). We maximize the rank correlation between sentence-level BLEU (SBLEU) scores and model scores over the N-best list, whereas the MERT paradigm focuses only on the potential 1-best candidates of the N-best list. After optimizing the MER and MRC objectives simultaneously with a multi-objective optimization algorithm, we interpolate the resulting parameters to obtain weights that outperform both. Experimental results on the WMT French–English data set show that our method significantly outperforms MERT on out-of-domain data sets and performs marginally better than MERT on in-domain data sets, supporting the usefulness of MRC for both domain-specific and general-domain data.
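The contrast between the two objectives can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the correlation measure, so Kendall's tau-a is assumed here; the per-sentence averaging, the function names, and the interpolation weight `alpha` are hypothetical; `one_best_sbleu` is a simplified stand-in for MERT (real MERT optimizes corpus-level BLEU); and the multi-objective optimizer itself is not shown.

```python
from itertools import combinations
from typing import List, Sequence

# Assumption: each N-best candidate is a feature vector h(e, f);
# the linear model score is the dot product w . h.
def model_score(weights: Sequence[float], features: Sequence[float]) -> float:
    return sum(w * h for w, h in zip(weights, features))

def kendall_tau(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs; ties count as neither."""
    n = len(xs)
    if n < 2:
        return 0.0
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def mrc_objective(weights, nbest_features, nbest_sbleu) -> float:
    """Hypothetical MRC objective: mean rank correlation between model scores
    and SBLEU over the N-best lists of all source sentences."""
    total = 0.0
    for feats, sbleu in zip(nbest_features, nbest_sbleu):
        scores = [model_score(weights, h) for h in feats]
        total += kendall_tau(scores, sbleu)
    return total / len(nbest_features)

def one_best_sbleu(weights, nbest_features, nbest_sbleu) -> float:
    """Simplified MERT-style view: only the SBLEU of the top-scoring
    candidate counts (ties resolved in favor of the first candidate)."""
    total = 0.0
    for feats, sbleu in zip(nbest_features, nbest_sbleu):
        scores = [model_score(weights, h) for h in feats]
        best = max(range(len(scores)), key=scores.__getitem__)
        total += sbleu[best]
    return total / len(nbest_features)

def interpolate(w_mer: Sequence[float], w_mrc: Sequence[float], alpha: float) -> List[float]:
    """Linear interpolation of two weight vectors; alpha is a hypothetical mixing weight."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(w_mer, w_mrc)]

# Usage: one source sentence, three candidates, two features each.
features = [[[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]]
sbleu = [[0.2, 0.5, 0.8]]
print(mrc_objective([0.0, 1.0], features, sbleu))   # 1.0: model ranking matches SBLEU ranking
print(one_best_sbleu([0.0, 1.0], features, sbleu))  # 0.8: SBLEU of the 1-best candidate
```

The point of the toy example is that the 1-best view looks only at the top candidate, while the rank-correlation view rewards the weights for ordering the whole N-best list consistently with SBLEU.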

Similar resources

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

Low cost Portability for statistical machine translation based on n-gram frequency and TF-IDF

Statistical machine translation relies heavily on the available training data. In some cases it is necessary to limit the amount of training data that can be created for or actually used by the systems. We introduce weighting schemes which allow us to sort sentences based on the frequency of unseen n-grams. A second approach uses TF-IDF to rank the sentences. After sorting we can select smaller...

A Systematic Comparison of Training Criteria for Statistical Machine Translation

We address the problem of training the free parameters of a statistical machine translation system. We show significant improvements over a state-of-the-art minimum error rate training baseline on a large Chinese–English translation task. We present novel training criteria based on maximum likelihood estimation and expected loss computation. Additionally, we compare the maximum a-posteriori deci...

Minimum Error Rate Training in Statistical Machine Translation

Often, the training procedure for statistical machine translation models is based on maximum likelihood or related criteria. A general problem of this approach is that there is only a loose relation to the final translation quality on unseen text. In this paper, we analyze various training criteria which directly optimize translation quality. These training criteria make use of recently propose...

Refined Lexicon Models for Statistical Machine Translation Using a Maximum Entropy Approach

Typically, the lexicon models used in statistical machine translation systems do not include any kind of linguistic or contextual information, which often leads to problems in performing a correct word-sense disambiguation. One way to deal with this problem within the statistical framework is using maximum entropy methods. In this paper, we present how to use this information within a statistic...

Publication date: 2011